Overview


Dataset statistics

Number of variables: 21
Number of observations: 2077964
Missing cells: 10579477
Missing cells (%): 24.2%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.7 GiB
Average record size in memory: 885.6 B
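
The percentage of missing cells follows from the other figures in this table. The sketch below recomputes it, assuming the denominator is simply variables × observations. The record-size estimate starts from the rounded "1.7 GiB", so it only approximates the reported 885.6 B.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 21
n_observations = 2_077_964
missing_cells = 10_579_477
total_bytes = 1.7 * 1024**3  # "1.7 GiB" is already rounded in the report

# Assumption: every variable contributes one cell per observation.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Rough estimate only, because total_bytes is rounded to one decimal.
approx_record_size = total_bytes / n_observations

print(f"Total cells: {total_cells}")
print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Approx. record size: {approx_record_size:.0f} B")
```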

Variable types

Numeric: 4
Text: 7
Categorical: 10

Alerts

cap-diameter is highly overall correlated with gill-spacing and 6 other fields (High correlation)
gill-spacing is highly overall correlated with cap-diameter and 1 other fields (High correlation)
spore-print-color is highly overall correlated with cap-diameter and 1 other fields (High correlation)
stem-height is highly overall correlated with cap-diameter (High correlation)
stem-root is highly overall correlated with cap-diameter (High correlation)
stem-width is highly overall correlated with cap-diameter (High correlation)
veil-color is highly overall correlated with cap-diameter (High correlation)
veil-type is highly overall correlated with cap-diameter (High correlation)
does-bruise-or-bleed is highly imbalanced (85.0%) (Imbalance)
gill-spacing is highly imbalanced (78.9%) (Imbalance)
stem-root is highly imbalanced (64.9%) (Imbalance)
veil-type is highly imbalanced (99.8%) (Imbalance)
veil-color is highly imbalanced (69.2%) (Imbalance)
has-ring is highly imbalanced (82.4%) (Imbalance)
ring-type is highly imbalanced (78.6%) (Imbalance)
spore-print-color is highly imbalanced (57.0%) (Imbalance)
habitat is highly imbalanced (71.7%) (Imbalance)
cap-surface has 446904 (21.5%) missing values (Missing)
gill-attachment has 349821 (16.8%) missing values (Missing)
gill-spacing has 839595 (40.4%) missing values (Missing)
stem-root has 1838012 (88.5%) missing values (Missing)
stem-surface has 1321488 (63.6%) missing values (Missing)
veil-type has 1971545 (94.9%) missing values (Missing)
veil-color has 1826124 (87.9%) missing values (Missing)
ring-type has 86195 (4.1%) missing values (Missing)
spore-print-color has 1899617 (91.4%) missing values (Missing)
id is uniformly distributed (Uniform)
id has unique values (Unique)
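
The imbalance percentages in these alerts appear consistent with a score of one minus the normalized Shannon entropy of the value counts. The sketch below is an illustration under that assumption, not a copy of the profiler's code. The function name `imbalance_score` is hypothetical, the ten listed counts come from the does-bruise-or-bleed section of this report, and the split of the remaining 21 observations over 12 rare values is an assumed one.

```python
import math


def imbalance_score(counts):
    """Return 1 - (Shannon entropy / maximum possible entropy) for class counts."""
    total = sum(counts)
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0  # convention chosen for this sketch
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log(n_classes)


# Ten most common counts reported for does-bruise-or-bleed.
listed = [1713662, 364227, 11, 7, 7, 5, 4, 4, 3, 3]
# Assumed split of the "Other values (12)" row, which totals 21 observations.
assumed_rest = [3, 3, 3, 2, 2, 2, 1, 1, 1, 1, 1, 1]

score = imbalance_score(listed + assumed_rest)
print(f"Imbalance: {100 * score:.1f}%")
```

A perfectly balanced variable scores 0 under this definition, and the score approaches 1 as a single value dominates. Because the normalization uses the number of distinct values, a handful of rare stray values raises the score noticeably.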

Reproduction

Analysis started: 2025-12-11 09:35:19.585741
Analysis finished: 2025-12-11 09:36:36.133877
Duration: 1 minute and 16.55 seconds
Software version: ydata-profiling v4.17.0
Download configuration: config.json
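
The reported duration can be checked against the two timestamps. This is a small sketch using only the standard library; the format string is an assumption about how the timestamps are written.

```python
from datetime import datetime

# Assumed timestamp layout: date, time, microseconds.
FMT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2025-12-11 09:35:19.585741", FMT)
finished = datetime.strptime("2025-12-11 09:36:36.133877", FMT)

# Elapsed wall-clock time in seconds, then split into minutes and seconds.
elapsed = (finished - started).total_seconds()
minutes, seconds = divmod(elapsed, 60)
print(f"{int(minutes)} minute(s) and {seconds:.2f} seconds")
```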

Variables

id
Real number (ℝ)

Uniform  Unique 

Distinct: 2077964
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4155926.5
Minimum: 3116945
Maximum: 5194908
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 15.9 MiB

Quantile statistics

Minimum: 3116945
5-th percentile: 3220843.1
Q1: 3636435.8
Median: 4155926.5
Q3: 4675417.2
95-th percentile: 5091009.8
Maximum: 5194908
Range: 2077963
Interquartile range (IQR): 1038981.5

Descriptive statistics

Standard deviation: 599856.68
Coefficient of variation (CV): 0.14433765
Kurtosis: -1.2
Mean: 4155926.5
Median Absolute Deviation (MAD): 519491
Skewness: 2.12746 × 10^-16
Sum: 8.6358657 × 10^12
Variance: 3.5982804 × 10^11
Monotonicity: Strictly increasing
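
Because id is a run of consecutive integers, most of these statistics have closed forms. The sketch below recomputes a few of them from the minimum and maximum alone. It assumes the report uses the sample variance (divisor n - 1) and linear interpolation for quantiles, which is what the printed numbers suggest.

```python
import math

# Minimum and maximum taken from the id statistics.
minimum, maximum = 3_116_945, 5_194_908
n = maximum - minimum + 1  # one consecutive integer per row

mean = (minimum + maximum) / 2
variance = n * (n + 1) / 12      # sample variance of n consecutive integers
std = math.sqrt(variance)
cv = std / mean                  # coefficient of variation
value_range = maximum - minimum
iqr = (n - 1) / 2                # Q3 - Q1 under linear interpolation

print(f"n={n}, mean={mean}, std={std:.2f}, cv={cv:.8f}")
print(f"range={value_range}, iqr={iqr}, variance={variance:.7e}")
```

The kurtosis of -1.2 and the near-zero skewness are also what a uniform distribution would give, which fits the Uniform alert on this variable.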
Histogram with fixed size bins (bins=50)
Common Values

Value                    Count      Frequency (%)
3116945                  1          < 0.1%
4502250                  1          < 0.1%
4502263                  1          < 0.1%
4502262                  1          < 0.1%
4502261                  1          < 0.1%
4502260                  1          < 0.1%
4502259                  1          < 0.1%
4502258                  1          < 0.1%
4502257                  1          < 0.1%
4502256                  1          < 0.1%
Other values (2077954)   2077954    > 99.9%

Minimum 10 values

Value                    Count      Frequency (%)
3116945                  1          < 0.1%
3116946                  1          < 0.1%
3116947                  1          < 0.1%
3116948                  1          < 0.1%
3116949                  1          < 0.1%
3116950                  1          < 0.1%
3116951                  1          < 0.1%
3116952                  1          < 0.1%
3116953                  1          < 0.1%
3116954                  1          < 0.1%

Maximum 10 values

Value                    Count      Frequency (%)
5194908                  1          < 0.1%
5194907                  1          < 0.1%
5194906                  1          < 0.1%
5194905                  1          < 0.1%
5194904                  1          < 0.1%
5194903                  1          < 0.1%
5194902                  1          < 0.1%
5194901                  1          < 0.1%
5194900                  1          < 0.1%
5194899                  1          < 0.1%

cap-diameter
Real number (ℝ)

High correlation 

Distinct: 3745
Distinct (%): 0.2%
Missing: 7
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.3061922
Minimum: 0
Maximum: 607
Zeros: 2
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 15.9 MiB

Quantile statistics

Minimum: 0
5-th percentile: 1.34
Q1: 3.31
Median: 5.74
Q3: 8.23
95-th percentile: 13.23
Maximum: 607
Range: 607
Interquartile range (IQR): 4.92

Descriptive statistics

Standard deviation: 4.6854624
Coefficient of variation (CV): 0.74299391
Kurtosis: 162.32193
Mean: 6.3061922
Median Absolute Deviation (MAD): 2.46
Skewness: 4.9571607
Sum: 13103996
Variance: 21.953558
Monotonicity: Not monotonic
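
The maximum of 607 sits far above the 95-th percentile of 13.23, and the kurtosis is very high, so a few extreme values probably dominate the tail. One common way to flag them is Tukey's 1.5 × IQR rule. The report itself does not apply this rule; the sketch below is an illustration, and `is_outlier` is a hypothetical helper.

```python
# Quartiles taken from the cap-diameter quantile statistics.
q1, q3 = 3.31, 8.23

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr


def is_outlier(value, lower=lower_fence, upper=upper_fence):
    """Flag values outside the 1.5 * IQR fences (Tukey's rule)."""
    return value < lower or value > upper


print(f"IQR={iqr:.2f}, fences=({lower_fence:.2f}, {upper_fence:.2f})")
for candidate in (5.74, 13.23, 73.36, 607.0):
    print(candidate, is_outlier(candidate))
```

The lower fence is negative here, so only large diameters can be flagged. Whether to cap, drop, or keep such rows depends on the downstream model.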
Histogram with fixed size bins (bins=50)
Common Values

Value                 Count      Frequency (%)
1.49                  5474       0.3%
3.18                  5174       0.2%
3.14                  4956       0.2%
1.51                  4720       0.2%
3.28                  4577       0.2%
3.24                  4517       0.2%
2.87                  4475       0.2%
4.04                  4443       0.2%
3.85                  4382       0.2%
3.45                  4357       0.2%
Other values (3735)   2030882    97.7%

Minimum 10 values

Value                 Count      Frequency (%)
0                     2          < 0.1%
0.02                  1          < 0.1%
0.03                  2          < 0.1%
0.4                   4          < 0.1%
0.44                  7          < 0.1%
0.45                  4          < 0.1%
0.46                  8          < 0.1%
0.47                  38         < 0.1%
0.48                  26         < 0.1%
0.49                  23         < 0.1%

Maximum 10 values

Value                 Count      Frequency (%)
607                   1          < 0.1%
73.36                 1          < 0.1%
68                    1          < 0.1%
64.72                 1          < 0.1%
62.34                 2          < 0.1%
62.33                 1          < 0.1%
62.32                 1          < 0.1%
62.31                 1          < 0.1%
62.3                  1          < 0.1%
62.1                  1          < 0.1%
Distinct62
Distinct (%)< 0.1%
Missing31
Missing (%)< 0.1%
Memory size114.9 MiB

Length

Max length12
Median length1
Mean length1.0000611
Min length1

Characters and Unicode

Total characters2078060
Distinct characters36
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique38 ?
Unique (%)< 0.1%

Sample

1st rowx
2nd rowo
3rd rowb
4th rowx
5th rowx
ValueCountFrequency (%)
x957949
46.1%
f452364
21.8%
s242698
 
11.7%
b211879
 
10.2%
o71972
 
3.5%
p71303
 
3.4%
c69436
 
3.3%
e33
 
< 0.1%
d30
 
< 0.1%
t28
 
< 0.1%
Other values (51)245
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
x957949
46.1%
f452364
21.8%
s242700
 
11.7%
b211879
 
10.2%
o71972
 
3.5%
p71305
 
3.4%
c69437
 
3.3%
e36
 
< 0.1%
.32
 
< 0.1%
d31
 
< 0.1%
Other values (26)355
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
(unknown)2078060
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
x957949
46.1%
f452364
21.8%
s242700
 
11.7%
b211879
 
10.2%
o71972
 
3.5%
p71305
 
3.4%
c69437
 
3.3%
e36
 
< 0.1%
.32
 
< 0.1%
d31
 
< 0.1%
Other values (26)355
 
< 0.1%

Most occurring scripts

ValueCountFrequency (%)
(unknown)2078060
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
x957949
46.1%
f452364
21.8%
s242700
 
11.7%
b211879
 
10.2%
o71972
 
3.5%
p71305
 
3.4%
c69437
 
3.3%
e36
 
< 0.1%
.32
 
< 0.1%
d31
 
< 0.1%
Other values (26)355
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
(unknown)2078060
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
x957949
46.1%
f452364
21.8%
s242700
 
11.7%
b211879
 
10.2%
o71972
 
3.5%
p71305
 
3.4%
c69437
 
3.3%
e36
 
< 0.1%
.32
 
< 0.1%
d31
 
< 0.1%
Other values (26)355
 
< 0.1%

cap-surface
Text

Missing 

Distinct59
Distinct (%)< 0.1%
Missing446904
Missing (%)21.5%
Memory size103.9 MiB

Length

Max length17
Median length1
Mean length1.0001177
Min length1

Characters and Unicode

Total characters1631252
Distinct characters37
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique33 ?
Unique (%)< 0.1%

Sample

1st rowt
2nd rowg
3rd rowt
4th rowh
5th rowh
ValueCountFrequency (%)
t306852
18.8%
s257190
15.8%
y218336
13.4%
h189737
11.6%
g176140
10.8%
d137675
8.4%
k86041
 
5.3%
e79918
 
4.9%
i75570
 
4.6%
w73109
 
4.5%
Other values (50)30494
 
1.9%

Most occurring characters

ValueCountFrequency (%)
t306857
18.8%
s257204
15.8%
y218336
13.4%
h189739
11.6%
g176141
10.8%
d137677
8.4%
k86041
 
5.3%
e79928
 
4.9%
i75575
 
4.6%
w73109
 
4.5%
Other values (27)30645
 
1.9%

Most occurring categories

ValueCountFrequency (%)
(unknown)1631252
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
t306857
18.8%
s257204
15.8%
y218336
13.4%
h189739
11.6%
g176141
10.8%
d137677
8.4%
k86041
 
5.3%
e79928
 
4.9%
i75575
 
4.6%
w73109
 
4.5%
Other values (27)30645
 
1.9%

Most occurring scripts

ValueCountFrequency (%)
(unknown)1631252
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
t306857
18.8%
s257204
15.8%
y218336
13.4%
h189739
11.6%
g176141
10.8%
d137677
8.4%
k86041
 
5.3%
e79928
 
4.9%
i75575
 
4.6%
w73109
 
4.5%
Other values (27)30645
 
1.9%

Most occurring blocks

ValueCountFrequency (%)
(unknown)1631252
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
t306857
18.8%
s257204
15.8%
y218336
13.4%
h189739
11.6%
g176141
10.8%
d137677
8.4%
k86041
 
5.3%
e79928
 
4.9%
i75575
 
4.6%
w73109
 
4.5%
Other values (27)30645
 
1.9%
Distinct57
Distinct (%)< 0.1%
Missing13
Missing (%)< 0.1%
Memory size114.9 MiB

Length

Max length88
Median length1
Mean length1.0001251
Min length1

Characters and Unicode

Total characters2078211
Distinct characters35
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique30 ?
Unique (%)< 0.1%

Sample

1st rown
2nd rowo
3rd rown
4th rown
5th rowy
ValueCountFrequency (%)
n904307
43.5%
y259062
 
12.5%
w253844
 
12.2%
g140681
 
6.8%
e131524
 
6.3%
o119764
 
5.8%
p61186
 
2.9%
r51784
 
2.5%
u48865
 
2.4%
b40790
 
2.0%
Other values (47)66144
 
3.2%

Most occurring characters

ValueCountFrequency (%)
n904316
43.5%
y259067
 
12.5%
w253844
 
12.2%
g140688
 
6.8%
e131531
 
6.3%
o119768
 
5.8%
p61192
 
2.9%
r51792
 
2.5%
u48865
 
2.4%
b40791
 
2.0%
Other values (25)66357
 
3.2%

Most occurring categories

ValueCountFrequency (%)
(unknown)2078211
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
n904316
43.5%
y259067
 
12.5%
w253844
 
12.2%
g140688
 
6.8%
e131531
 
6.3%
o119768
 
5.8%
p61192
 
2.9%
r51792
 
2.5%
u48865
 
2.4%
b40791
 
2.0%
Other values (25)66357
 
3.2%

Most occurring scripts

ValueCountFrequency (%)
(unknown)2078211
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
n904316
43.5%
y259067
 
12.5%
w253844
 
12.2%
g140688
 
6.8%
e131531
 
6.3%
o119768
 
5.8%
p61192
 
2.9%
r51792
 
2.5%
u48865
 
2.4%
b40791
 
2.0%
Other values (25)66357
 
3.2%

Most occurring blocks

ValueCountFrequency (%)
(unknown)2078211
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
n904316
43.5%
y259067
 
12.5%
w253844
 
12.2%
g140688
 
6.8%
e131531
 
6.3%
o119768
 
5.8%
p61192
 
2.9%
r51792
 
2.5%
u48865
 
2.4%
b40791
 
2.0%
Other values (25)66357
 
3.2%

does-bruise-or-bleed
Categorical

Imbalance 

Distinct: 22
Distinct (%): < 0.1%
Missing: 10
Missing (%): < 0.1%
Memory size: 114.9 MiB

Value               Count
f                   1713662
t                   364227
x                   11
w                   7
s                   7
Other values (17)   40

Length

Max length6
Median length1
Mean length1.0000048
Min length1

Characters and Unicode

Total characters2077964
Distinct characters22
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique6 ?
Unique (%)< 0.1%

Sample

1st rowt
2nd rowf
3rd rowf
4th rowf
5th rowf

Common Values

Value               Count      Frequency (%)
f                   1713662    82.5%
t                   364227     17.5%
x                   11         < 0.1%
w                   7          < 0.1%
s                   7          < 0.1%
p                   5          < 0.1%
n                   4          < 0.1%
h                   4          < 0.1%
k                   3          < 0.1%
o                   3          < 0.1%
Other values (12)   21         < 0.1%
(Missing)           10         < 0.1%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
f1713662
82.5%
t364228
 
17.5%
x11
 
< 0.1%
w7
 
< 0.1%
s7
 
< 0.1%
p5
 
< 0.1%
n4
 
< 0.1%
h4
 
< 0.1%
c3
 
< 0.1%
g3
 
< 0.1%
Other values (12)21
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
f1713662
82.5%
t364228
 
17.5%
x11
 
< 0.1%
s10
 
< 0.1%
w7
 
< 0.1%
o5
 
< 0.1%
e5
 
< 0.1%
n5
 
< 0.1%
p5
 
< 0.1%
h4
 
< 0.1%
Other values (12)22
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
(unknown)2077964
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
f1713662
82.5%
t364228
 
17.5%
x11
 
< 0.1%
s10
 
< 0.1%
w7
 
< 0.1%
o5
 
< 0.1%
e5
 
< 0.1%
n5
 
< 0.1%
p5
 
< 0.1%
h4
 
< 0.1%
Other values (12)22
 
< 0.1%

Most occurring scripts

ValueCountFrequency (%)
(unknown)2077964
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
f1713662
82.5%
t364228
 
17.5%
x11
 
< 0.1%
s10
 
< 0.1%
w7
 
< 0.1%
o5
 
< 0.1%
e5
 
< 0.1%
n5
 
< 0.1%
p5
 
< 0.1%
h4
 
< 0.1%
Other values (12)22
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
(unknown)2077964
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
f1713662
82.5%
t364228
 
17.5%
x11
 
< 0.1%
s10
 
< 0.1%
w7
 
< 0.1%
o5
 
< 0.1%
e5
 
< 0.1%
n5
 
< 0.1%
p5
 
< 0.1%
h4
 
< 0.1%
Other values (12)22
 
< 0.1%

gill-attachment
Text

Missing 

Distinct66
Distinct (%)< 0.1%
Missing349821
Missing (%)16.8%
Memory size106.3 MiB

Length

Max length17
Median length1
Mean length1.0001007
Min length1

Characters and Unicode

Total characters1728317
Distinct characters37
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique41 ?
Unique (%)< 0.1%

Sample

1st rows
2nd rowp
3rd rowx
4th rowp
5th rowf
ValueCountFrequency (%)
a430960
24.9%
d392584
22.7%
x240758
13.9%
e201277
11.6%
s196280
11.4%
p186334
10.8%
f79630
 
4.6%
c53
 
< 0.1%
u35
 
< 0.1%
t27
 
< 0.1%
Other values (56)207
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
a430966
24.9%
d392584
22.7%
x240758
13.9%
e201282
11.6%
s196293
11.4%
p186336
10.8%
f79630
 
4.6%
c56
 
< 0.1%
.37
 
< 0.1%
u35
 
< 0.1%
Other values (27)340
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
(unknown)1728317
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
a430966
24.9%
d392584
22.7%
x240758
13.9%
e201282
11.6%
s196293
11.4%
p186336
10.8%
f79630
 
4.6%
c56
 
< 0.1%
.37
 
< 0.1%
u35
 
< 0.1%
Other values (27)340
 
< 0.1%

Most occurring scripts

ValueCountFrequency (%)
(unknown)1728317
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
a430966
24.9%
d392584
22.7%
x240758
13.9%
e201282
11.6%
s196293
11.4%
p186336
10.8%
f79630
 
4.6%
c56
 
< 0.1%
.37
 
< 0.1%
u35
 
< 0.1%
Other values (27)340
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
(unknown)1728317
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
a430966
24.9%
d392584
22.7%
x240758
13.9%
e201282
11.6%
s196293
11.4%
p186336
10.8%
f79630
 
4.6%
c56
 
< 0.1%
.37
 
< 0.1%
u35
 
< 0.1%
Other values (27)340
 
< 0.1%

gill-spacing
Categorical

High correlation  Imbalance  Missing 

Distinct: 35
Distinct (%): < 0.1%
Missing: 839595
Missing (%): 40.4%
Memory size: 113.3 MiB

Value               Count
c                   886976
d                   272085
f                   79223
e                   11
s                   10
Other values (30)   64

Length

Max length9
Median length1
Mean length1.0000485
Min length1

Characters and Unicode

Total characters1238429
Distinct characters33
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique21 ?
Unique (%)< 0.1%

Sample

1st rowc
2nd rowc
3rd rowc
4th rowc
5th rowf

Common Values

Value               Count     Frequency (%)
c                   886976    42.7%
d                   272085    13.1%
f                   79223     3.8%
e                   11        < 0.1%
s                   10        < 0.1%
a                   10        < 0.1%
b                   7         < 0.1%
x                   6         < 0.1%
p                   5         < 0.1%
y                   4         < 0.1%
Other values (25)   32        < 0.1%
(Missing)           839595    40.4%

Length

Histogram of lengths of the category
Value               Count     Frequency (%)
c                   886977    71.6%
d                   272085    22.0%
f                   79224     6.4%
e                   11        < 0.1%
s                   10        < 0.1%
a                   10        < 0.1%
b                   7         < 0.1%
x                   6         < 0.1%
p                   5         < 0.1%
y                   4         < 0.1%
Other values (25)   33        < 0.1%
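
The same value of gill-spacing is listed at 42.7% in the Common Values table and at 71.6% in the table above. The difference appears to come from the denominator: all rows in the first case, only non-missing rows in the second. The sketch below illustrates this with the Common Values count for "c"; the second table's count differs by one, which does not change the rounded result.

```python
n_rows = 2_077_964   # observations in the dataset
n_missing = 839_595  # missing gill-spacing values
count_c = 886_976    # rows where gill-spacing is "c" (Common Values table)

# Share relative to every row, including rows where the value is missing.
share_of_all_rows = 100 * count_c / n_rows

# Share relative to rows that actually have a gill-spacing value.
share_of_present = 100 * count_c / (n_rows - n_missing)

print(f"Share of all rows: {share_of_all_rows:.1f}%")
print(f"Share of non-missing rows: {share_of_present:.1f}%")
```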

Most occurring characters

ValueCountFrequency (%)
c886977
71.6%
d272087
 
22.0%
f79224
 
6.4%
e14
 
< 0.1%
.14
 
< 0.1%
s13
 
< 0.1%
a11
 
< 0.1%
67
 
< 0.1%
b7
 
< 0.1%
17
 
< 0.1%
Other values (23)68
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
(unknown)1238429
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
c886977
71.6%
d272087
 
22.0%
f79224
 
6.4%
e14
 
< 0.1%
.14
 
< 0.1%
s13
 
< 0.1%
a11
 
< 0.1%
67
 
< 0.1%
b7
 
< 0.1%
17
 
< 0.1%
Other values (23)68
 
< 0.1%

Most occurring scripts

ValueCountFrequency (%)
(unknown)1238429
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
c886977
71.6%
d272087
 
22.0%
f79224
 
6.4%
e14
 
< 0.1%
.14
 
< 0.1%
s13
 
< 0.1%
a11
 
< 0.1%
67
 
< 0.1%
b7
 
< 0.1%
17
 
< 0.1%
Other values (23)68
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
(unknown)1238429
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
c886977
71.6%
d272087
 
22.0%
f79224
 
6.4%
e14
 
< 0.1%
.14
 
< 0.1%
s13
 
< 0.1%
a11
 
< 0.1%
67
 
< 0.1%
b7
 
< 0.1%
17
 
< 0.1%
Other values (23)68
 
< 0.1%
Distinct56
Distinct (%)< 0.1%
Missing49
Missing (%)< 0.1%
Memory size114.9 MiB

Length

Max length20
Median length1
Mean length1.0001107
Min length1

Characters and Unicode

Total characters2078145
Distinct characters36
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique27 ?
Unique (%)< 0.1%

Sample

1st roww
2nd rowy
3rd rown
4th rown
5th rowy
ValueCountFrequency (%)
w620774
29.9%
n362169
17.4%
y313932
15.1%
p229155
 
11.0%
g141520
 
6.8%
o105048
 
5.1%
k85360
 
4.1%
f79483
 
3.8%
r41499
 
2.0%
e37432
 
1.8%
Other values (46)61546
 
3.0%

Most occurring characters

ValueCountFrequency (%)
w620774
29.9%
n362180
17.4%
y313933
15.1%
p229162
 
11.0%
g141527
 
6.8%
o105061
 
5.1%
k85360
 
4.1%
f79483
 
3.8%
r41511
 
2.0%
e37450
 
1.8%
Other values (26)61704
 
3.0%

Most occurring categories

ValueCountFrequency (%)
(unknown)2078145
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
w620774
29.9%
n362180
17.4%
y313933
15.1%
p229162
 
11.0%
g141527
 
6.8%
o105061
 
5.1%
k85360
 
4.1%
f79483
 
3.8%
r41511
 
2.0%
e37450
 
1.8%
Other values (26)61704
 
3.0%

Most occurring scripts

ValueCountFrequency (%)
(unknown)2078145
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
w620774
29.9%
n362180
17.4%
y313933
15.1%
p229162
 
11.0%
g141527
 
6.8%
o105061
 
5.1%
k85360
 
4.1%
f79483
 
3.8%
r41511
 
2.0%
e37450
 
1.8%
Other values (26)61704
 
3.0%

Most occurring blocks

ValueCountFrequency (%)
(unknown)2078145
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
w620774
29.9%
n362180
17.4%
y313933
15.1%
p229162
 
11.0%
g141527
 
6.8%
o105061
 
5.1%
k85360
 
4.1%
f79483
 
3.8%
r41511
 
2.0%
e37450
 
1.8%
Other values (26)61704
 
3.0%

stem-height
Real number (ℝ)

High correlation 

Distinct: 2664
Distinct (%): 0.1%
Missing: 1
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.3465092
Minimum: 0
Maximum: 57.29
Zeros: 333
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 15.9 MiB

Quantile statistics

Minimum: 0
5-th percentile: 3.16
Q1: 4.67
Median: 5.88
Q3: 7.41
95-th percentile: 11.2
Maximum: 57.29
Range: 57.29
Interquartile range (IQR): 2.74

Descriptive statistics

Standard deviation: 2.6989778
Coefficient of variation (CV): 0.42526966
Kurtosis: 7.4591835
Mean: 6.3465092
Median Absolute Deviation (MAD): 1.33
Skewness: 1.9219513
Sum: 13187811
Variance: 7.2844812
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common Values

Value                 Count      Frequency (%)
5.24                  8164       0.4%
5.92                  7818       0.4%
5.32                  7242       0.3%
5.35                  6982       0.3%
6.03                  6943       0.3%
5.99                  6941       0.3%
5.54                  6768       0.3%
5.96                  6674       0.3%
5.77                  6643       0.3%
5.65                  6641       0.3%
Other values (2654)   2007147    96.6%

Minimum 10 values

Value                 Count      Frequency (%)
0                     333        < 0.1%
0.92                  1          < 0.1%
0.97                  3          < 0.1%
1                     2          < 0.1%
1.01                  1          < 0.1%
1.09                  5          < 0.1%
1.1                   7          < 0.1%
1.11                  6          < 0.1%
1.12                  27         < 0.1%
1.13                  10         < 0.1%

Maximum 10 values

Value                 Count      Frequency (%)
57.29                 1          < 0.1%
53.24                 2          < 0.1%
50.42                 1          < 0.1%
50.4                  1          < 0.1%
50.22                 1          < 0.1%
48.57                 1          < 0.1%
48.3                  1          < 0.1%
47.94                 1          < 0.1%
47.82                 1          < 0.1%
46.93                 1          < 0.1%

stem-width
Real number (ℝ)

High correlation 

Distinct: 5610
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 11.148374
Minimum: 0
Maximum: 102.91
Zeros: 324
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 15.9 MiB

Quantile statistics

Minimum: 0
5-th percentile: 1.58
Q1: 4.97
Median: 9.64
Q3: 15.62
95-th percentile: 26.49
Maximum: 102.91
Range: 102.91
Interquartile range (IQR): 10.65

Descriptive statistics

Standard deviation: 8.1001812
Coefficient of variation (CV): 0.72657961
Kurtosis: 2.5675929
Mean: 11.148374
Median Absolute Deviation (MAD): 5.23
Skewness: 1.2493412
Sum: 23165920
Variance: 65.612936
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common Values

Value                 Count      Frequency (%)
2.41                  5178       0.2%
2.45                  4837       0.2%
2.49                  4664       0.2%
2.56                  4615       0.2%
2.47                  4565       0.2%
2.52                  4473       0.2%
2.51                  4422       0.2%
2.64                  4350       0.2%
2.6                   4165       0.2%
2.61                  4120       0.2%
Other values (5600)   2032575    97.8%

Minimum 10 values

Value                 Count      Frequency (%)
0                     324        < 0.1%
0.4                   1          < 0.1%
0.45                  2          < 0.1%
0.46                  1          < 0.1%
0.47                  1          < 0.1%
0.48                  1          < 0.1%
0.49                  1          < 0.1%
0.5                   2          < 0.1%
0.51                  1          < 0.1%
0.52                  11         < 0.1%

Maximum 10 values

Value                 Count      Frequency (%)
102.91                1          < 0.1%
102.57                1          < 0.1%
102.48                2          < 0.1%
101.72                1          < 0.1%
101.69                2          < 0.1%
98                    1          < 0.1%
95.99                 1          < 0.1%
95.96                 1          < 0.1%
95.68                 1          < 0.1%
95.18                 1          < 0.1%

stem-root
Categorical

High correlation  Imbalance  Missing 

Distinct31
Distinct (%)< 0.1%
Missing1838012
Missing (%)88.5%
Memory size111.4 MiB
b
110581 
s
78253 
r
31606 
c
19025 
f
 
374
Other values (26)
 
113

Length

Max length5
Median length1
Mean length1.0001125
Min length1

Characters and Unicode

Total characters239979
Distinct characters33
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique11 ?
Unique (%)< 0.1%

Sample

1st rowb
2nd rowb
3rd rowb
4th rows
5th rowb

Common Values

ValueCountFrequency (%)
b110581
 
5.3%
s78253
 
3.8%
r31606
 
1.5%
c19025
 
0.9%
f374
 
< 0.1%
y14
 
< 0.1%
g14
 
< 0.1%
p11
 
< 0.1%
u8
 
< 0.1%
d7
 
< 0.1%
Other values (21)59
 
< 0.1%
(Missing)1838012
88.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
b110581
46.1%
s78253
32.6%
r31606
 
13.2%
c19025
 
7.9%
f374
 
0.2%
y14
 
< 0.1%
g14
 
< 0.1%
p11
 
< 0.1%
u8
 
< 0.1%
d7
 
< 0.1%
Other values (21)59
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
b110581
46.1%
s78253
32.6%
r31606
 
13.2%
c19025
 
7.9%
f374
 
0.2%
y14
 
< 0.1%
g14
 
< 0.1%
p11
 
< 0.1%
u8
 
< 0.1%
.8
 
< 0.1%
Other values (23)85
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
(unknown)239979
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
b110581
46.1%
s78253
32.6%
r31606
 
13.2%
c19025
 
7.9%
f374
 
0.2%
y14
 
< 0.1%
g14
 
< 0.1%
p11
 
< 0.1%
u8
 
< 0.1%
.8
 
< 0.1%
Other values (23)85
 
< 0.1%

Most occurring scripts

ValueCountFrequency (%)
(unknown)239979
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
b110581
46.1%
s78253
32.6%
r31606
 
13.2%
c19025
 
7.9%
f374
 
0.2%
y14
 
< 0.1%
g14
 
< 0.1%
p11
 
< 0.1%
u8
 
< 0.1%
.8
 
< 0.1%
Other values (23)85
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
(unknown)239979
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
b110581
46.1%
s78253
32.6%
r31606
 
13.2%
c19025
 
7.9%
f374
 
0.2%
y14
 
< 0.1%
g14
 
< 0.1%
p11
 
< 0.1%
u8
 
< 0.1%
.8
 
< 0.1%
Other values (23)85
 
< 0.1%

stem-surface
Text

Missing 

Distinct54
Distinct (%)< 0.1%
Missing1321488
Missing (%)63.6%
Memory size82.2 MiB

Length

Max length20
Median length1
Mean length1.0002485
Min length1

Characters and Unicode

Total characters756664
Distinct characters37
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique28 ?
Unique (%)< 0.1%

Sample

1st rows
2nd rowg
3rd rowy
4th rowy
5th rowg
ValueCountFrequency (%)
s218496
28.9%
y169462
22.4%
i149192
19.7%
t98982
13.1%
g51848
 
6.9%
k49075
 
6.5%
h18861
 
2.5%
f310
 
< 0.1%
d41
 
< 0.1%
w37
 
< 0.1%
Other values (45)177
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
s218516
28.9%
y169462
22.4%
i149194
19.7%
t98983
13.1%
g51848
 
6.9%
k49075
 
6.5%
h18861
 
2.5%
f310
 
< 0.1%
d49
 
< 0.1%
e44
 
< 0.1%
Other values (27)322
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
(unknown)756664
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
s218516
28.9%
y169462
22.4%
i149194
19.7%
t98983
13.1%
g51848
 
6.9%
k49075
 
6.5%
h18861
 
2.5%
f310
 
< 0.1%
d49
 
< 0.1%
e44
 
< 0.1%
Other values (27)322
 
< 0.1%

Most occurring scripts

ValueCountFrequency (%)
(unknown)756664
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
s218516
28.9%
y169462
22.4%
i149194
19.7%
t98983
13.1%
g51848
 
6.9%
k49075
 
6.5%
h18861
 
2.5%
f310
 
< 0.1%
d49
 
< 0.1%
e44
 
< 0.1%
Other values (27)322
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
(unknown)756664
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
s218516
28.9%
y169462
22.4%
i149194
19.7%
t98983
13.1%
g51848
 
6.9%
k49075
 
6.5%
h18861
 
2.5%
f310
 
< 0.1%
d49
 
< 0.1%
e44
 
< 0.1%
Other values (27)322
 
< 0.1%
Distinct55
Distinct (%)< 0.1%
Missing21
Missing (%)< 0.1%
Memory size114.9 MiB

Length

Max length20
Median length1
Mean length1.0000895
Min length1

Characters and Unicode

Total characters2078129
Distinct characters36
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique31 ?
Unique (%)< 0.1%

Sample

1st roww
2nd rown
3rd rown
4th roww
5th rowy
ValueCountFrequency (%)
w797365
38.4%
n668156
32.2%
y250141
 
12.0%
g88202
 
4.2%
o75094
 
3.6%
e68804
 
3.3%
u44784
 
2.2%
p36356
 
1.7%
k22176
 
1.1%
r14910
 
0.7%
Other values (45)11955
 
0.6%

Most occurring characters

ValueCountFrequency (%)
w797365
38.4%
n668161
32.2%
y250143
 
12.0%
g88203
 
4.2%
o75110
 
3.6%
e68817
 
3.3%
u44785
 
2.2%
p36364
 
1.7%
k22176
 
1.1%
r14924
 
0.7%
Other values (26)12081
 
0.6%

Most occurring categories

ValueCountFrequency (%)
(unknown)2078129
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
w797365
38.4%
n668161
32.2%
y250143
 
12.0%
g88203
 
4.2%
o75110
 
3.6%
e68817
 
3.3%
u44785
 
2.2%
p36364
 
1.7%
k22176
 
1.1%
r14924
 
0.7%
Other values (26)12081
 
0.6%

Most occurring scripts

ValueCountFrequency (%)
(unknown)2078129
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
w797365
38.4%
n668161
32.2%
y250143
 
12.0%
g88203
 
4.2%
o75110
 
3.6%
e68817
 
3.3%
u44785
 
2.2%
p36364
 
1.7%
k22176
 
1.1%
r14924
 
0.7%
Other values (26)12081
 
0.6%

Most occurring blocks

ValueCountFrequency (%)
(unknown)2078129
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
w797365
38.4%
n668161
32.2%
y250143
 
12.0%
g88203
 
4.2%
o75110
 
3.6%
e68817
 
3.3%
u44785
 
2.2%
p36364
 
1.7%
k22176
 
1.1%
r14924
 
0.7%
Other values (26)12081
 
0.6%

veil-type
Categorical

High correlation  Imbalance  Missing 

Distinct15
Distinct (%)< 0.1%
Missing1971545
Missing (%)94.9%
Memory size111.2 MiB
u
106373 
w
 
12
e
 
6
k
 
5
y
 
3
Other values (10)
 
20

Length

Max length2
Median length1
Mean length1.0000094
Min length1

Characters and Unicode

Total characters106420
Distinct characters15
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique4 ?
Unique (%)< 0.1%

Sample

1st rowu
2nd rowu
3rd rowu
4th rowu
5th rowu

Common Values

ValueCountFrequency (%)
u106373
 
5.1%
w12
 
< 0.1%
e6
 
< 0.1%
k5
 
< 0.1%
y3
 
< 0.1%
p3
 
< 0.1%
g3
 
< 0.1%
s3
 
< 0.1%
n3
 
< 0.1%
a2
 
< 0.1%
Other values (5)6
 
< 0.1%
(Missing)1971545
94.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
u106373
> 99.9%
w12
 
< 0.1%
e6
 
< 0.1%
k5
 
< 0.1%
y3
 
< 0.1%
p3
 
< 0.1%
g3
 
< 0.1%
s3
 
< 0.1%
n3
 
< 0.1%
a2
 
< 0.1%
Other values (5)6
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
u106373
> 99.9%
w12
 
< 0.1%
e6
 
< 0.1%
k5
 
< 0.1%
y3
 
< 0.1%
p3
 
< 0.1%
g3
 
< 0.1%
s3
 
< 0.1%
n3
 
< 0.1%
a2
 
< 0.1%
Other values (5)7
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
(unknown)106420
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
u106373
> 99.9%
w12
 
< 0.1%
e6
 
< 0.1%
k5
 
< 0.1%
y3
 
< 0.1%
p3
 
< 0.1%
g3
 
< 0.1%
s3
 
< 0.1%
n3
 
< 0.1%
a2
 
< 0.1%
Other values (5)7
 
< 0.1%

Most occurring scripts

ValueCountFrequency (%)
(unknown)106420
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
u106373
> 99.9%
w12
 
< 0.1%
e6
 
< 0.1%
k5
 
< 0.1%
y3
 
< 0.1%
p3
 
< 0.1%
g3
 
< 0.1%
s3
 
< 0.1%
n3
 
< 0.1%
a2
 
< 0.1%
Other values (5)7
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
(unknown)106420
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
u106373
> 99.9%
w12
 
< 0.1%
e6
 
< 0.1%
k5
 
< 0.1%
y3
 
< 0.1%
p3
 
< 0.1%
g3
 
< 0.1%
s3
 
< 0.1%
n3
 
< 0.1%
a2
 
< 0.1%
Other values (5)7
 
< 0.1%

veil-color
Categorical

High correlation  Imbalance  Missing 

Distinct23
Distinct (%)< 0.1%
Missing1826124
Missing (%)87.9%
Memory size111.5 MiB
w
186432 
y
20782 
n
20256 
u
 
9413
k
 
8706
Other values (18)
 
6251

Length

Max length4
Median length1
Mean length1.0000119
Min length1

Characters and Unicode

Total characters251843
Distinct characters26
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique4 ?
Unique (%)< 0.1%

Sample

1st roww
2nd rown
3rd rowy
4th roww
5th roww

Common Values

ValueCountFrequency (%)
w186432
 
9.0%
y20782
 
1.0%
n20256
 
1.0%
u9413
 
0.5%
k8706
 
0.4%
e6147
 
0.3%
g20
 
< 0.1%
p18
 
< 0.1%
t9
 
< 0.1%
d9
 
< 0.1%
Other values (13)48
 
< 0.1%
(Missing)1826124
87.9%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
w | 186432 | 74.0%
y | 20782 | 8.3%
n | 20256 | 8.0%
u | 9413 | 3.7%
k | 8706 | 3.5%
e | 6147 | 2.4%
g | 20 | < 0.1%
p | 18 | < 0.1%
t | 9 | < 0.1%
d | 9 | < 0.1%
Other values (13) | 48 | < 0.1%
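
The same count for w appears with two different percentages in this section: 9.0% under Common Values and 74.0% in the table directly above. A plausible reading, sketched here with figures taken from this section, is that the first divides by all observations while the second divides by non-missing values only. The variable names are illustrative.

```python
# Figures taken from the veil-color section of this report.
n_rows = 2_077_964      # number of observations
n_missing = 1_826_124   # missing veil-color values
count_w = 186_432       # rows where veil-color is "w"

n_present = n_rows - n_missing

share_of_all = 100 * count_w / n_rows          # denominator: every row
share_of_present = 100 * count_w / n_present   # denominator: non-missing rows only

print(f"{share_of_all:.1f}% of all rows")
print(f"{share_of_present:.1f}% of non-missing rows")
```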

Most occurring characters

Value | Count | Frequency (%)
w | 186432 | 74.0%
y | 20782 | 8.3%
n | 20256 | 8.0%
u | 9413 | 3.7%
k | 8706 | 3.5%
e | 6147 | 2.4%
g | 20 | < 0.1%
p | 18 | < 0.1%
t | 9 | < 0.1%
d | 9 | < 0.1%
Other values (16) | 51 | < 0.1%

Most occurring categories, scripts and blocks

Value | Count | Frequency (%)
(unknown) | 251843 | 100.0%

Every character falls into a single category, script and block, each reported as (unknown), so the per-category, per-script and per-block character tables repeat the "Most occurring characters" frequencies above.

has-ring
Categorical

Imbalance 

Distinct: 23
Distinct (%): < 0.1%
Missing: 19
Missing (%): < 0.1%
Memory size: 114.9 MiB

Value | Count
f | 1578092
t | 499759
e | 14
r | 11
c | 9
Other values (18) | 60

Length

Max length: 5
Median length: 1
Mean length: 1.0000019
Min length: 1

Characters and Unicode

Total characters: 2077949
Distinct characters: 27
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 4
Unique (%): < 0.1%

Sample

1st row: t
2nd row: f
3rd row: f
4th row: t
5th row: t

Common Values

Value | Count | Frequency (%)
f | 1578092 | 75.9%
t | 499759 | 24.1%
e | 14 | < 0.1%
r | 11 | < 0.1%
c | 9 | < 0.1%
g | 9 | < 0.1%
h | 8 | < 0.1%
d | 5 | < 0.1%
l | 5 | < 0.1%
p | 5 | < 0.1%
Other values (13) | 28 | < 0.1%
(Missing) | 19 | < 0.1%
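
The Imbalance flag on has-ring can be approximated from the counts above. The sketch below assumes the score is one minus the Shannon entropy of the category counts divided by the maximum entropy for the number of distinct values; that definition is an assumption, not something the report states, and `imbalance_score` is a hypothetical helper. The 28 observations spread over the 13 unlisted values are left out, which barely changes the result.

```python
import math


def imbalance_score(counts, n_distinct=None):
    """Return 1 - H / log2(k): 0 for a uniform column, close to 1 for a dominated one."""
    k = n_distinct if n_distinct is not None else len(counts)
    if k < 2:
        return 0.0
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)


# has-ring counts for the ten listed values: f, t, e, r, c, g, h, d, l, p.
has_ring = [1578092, 499759, 14, 11, 9, 9, 8, 5, 5, 5]
score = imbalance_score(has_ring, n_distinct=23)  # the report lists 23 distinct values
print(f"{score:.1%}")
```

Under this assumption the score lands near the 82.4% given for has-ring in the report's Alerts. Dividing by log2 of the distinct count means a column with many rare values is scored as more imbalanced than a plain two-value split with the same proportions.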

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
f | 1578092 | 75.9%
t | 499759 | 24.1%
e | 14 | < 0.1%
r | 11 | < 0.1%
c | 9 | < 0.1%
g | 9 | < 0.1%
h | 8 | < 0.1%
d | 5 | < 0.1%
l | 5 | < 0.1%
p | 5 | < 0.1%
Other values (13) | 28 | < 0.1%

Most occurring characters

Value | Count | Frequency (%)
f | 1578092 | 75.9%
t | 499759 | 24.1%
e | 14 | < 0.1%
r | 11 | < 0.1%
c | 9 | < 0.1%
g | 9 | < 0.1%
h | 8 | < 0.1%
d | 5 | < 0.1%
l | 5 | < 0.1%
p | 5 | < 0.1%
Other values (17) | 32 | < 0.1%

Most occurring categories, scripts and blocks

Value | Count | Frequency (%)
(unknown) | 2077949 | 100.0%

Every character falls into a single category, script and block, each reported as (unknown), so the per-category, per-script and per-block character tables repeat the "Most occurring characters" frequencies above.

ring-type
Categorical

Imbalance  Missing 

Distinct: 36
Distinct (%): < 0.1%
Missing: 86195
Missing (%): 4.1%
Memory size: 114.8 MiB

Value | Count
f | 1650200
e | 80348
z | 75917
l | 48847
p | 45654
Other values (31) | 90803

Length

Max length: 17
Median length: 1
Mean length: 1.0000407
Min length: 1

Characters and Unicode

Total characters: 1991850
Distinct characters: 35
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 12
Unique (%): < 0.1%

Sample

1st row: g
2nd row: f
3rd row: f
4th row: z
5th row: r

Common Values

Value | Count | Frequency (%)
f | 1650200 | 79.4%
e | 80348 | 3.9%
z | 75917 | 3.7%
l | 48847 | 2.4%
p | 45654 | 2.2%
r | 45400 | 2.2%
g | 42472 | 2.0%
m | 2689 | 0.1%
t | 46 | < 0.1%
d | 24 | < 0.1%
Other values (26) | 172 | < 0.1%
(Missing) | 86195 | 4.1%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
f | 1650201 | 82.9%
e | 80348 | 4.0%
z | 75917 | 3.8%
l | 48847 | 2.5%
p | 45655 | 2.3%
r | 45400 | 2.3%
g | 42472 | 2.1%
m | 2689 | 0.1%
t | 46 | < 0.1%
d | 24 | < 0.1%
Other values (26) | 172 | < 0.1%

Most occurring characters

Value | Count | Frequency (%)
f | 1650201 | 82.8%
e | 80355 | 4.0%
z | 75917 | 3.8%
l | 48848 | 2.5%
p | 45661 | 2.3%
r | 45407 | 2.3%
g | 42476 | 2.1%
m | 2689 | 0.1%
t | 51 | < 0.1%
d | 25 | < 0.1%
Other values (25) | 220 | < 0.1%

Most occurring categories, scripts and blocks

Value | Count | Frequency (%)
(unknown) | 1991850 | 100.0%

Every character falls into a single category, script and block, each reported as (unknown), so the per-category, per-script and per-block character tables repeat the "Most occurring characters" frequencies above.

spore-print-color
Categorical

High correlation  Imbalance  Missing 

Distinct: 33
Distinct (%): < 0.1%
Missing: 1899617
Missing (%): 91.4%
Memory size: 111.3 MiB

Value | Count
k | 71573
p | 45452
w | 33657
n | 15081
r | 5305
Other values (28) | 7279

Length

Max length: 10
Median length: 1
Mean length: 1.0002467
Min length: 1

Characters and Unicode

Total characters: 178391
Distinct characters: 34
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 12
Unique (%): < 0.1%

Sample

1st row: p
2nd row: k
3rd row: k
4th row: w
5th row: n

Common Values

Value | Count | Frequency (%)
k | 71573 | 3.4%
p | 45452 | 2.2%
w | 33657 | 1.6%
n | 15081 | 0.7%
r | 5305 | 0.3%
u | 4845 | 0.2%
g | 2323 | 0.1%
y | 18 | < 0.1%
f | 11 | < 0.1%
s | 10 | < 0.1%
Other values (23) | 72 | < 0.1%
(Missing) | 1899617 | 91.4%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
k | 71573 | 40.1%
p | 45452 | 25.5%
w | 33657 | 18.9%
n | 15081 | 8.5%
r | 5305 | 3.0%
u | 4845 | 2.7%
g | 2323 | 1.3%
y | 18 | < 0.1%
f | 11 | < 0.1%
s | 10 | < 0.1%
Other values (23) | 72 | < 0.1%

Most occurring characters

Value | Count | Frequency (%)
k | 71573 | 40.1%
p | 45453 | 25.5%
w | 33657 | 18.9%
n | 15082 | 8.5%
r | 5307 | 3.0%
u | 4845 | 2.7%
g | 2324 | 1.3%
y | 19 | < 0.1%
f | 11 | < 0.1%
e | 11 | < 0.1%
Other values (24) | 109 | 0.1%

Most occurring categories, scripts and blocks

Value | Count | Frequency (%)
(unknown) | 178391 | 100.0%

Every character falls into a single category, script and block, each reported as (unknown), so the per-category, per-script and per-block character tables repeat the "Most occurring characters" frequencies above.

habitat
Categorical

Imbalance 

Distinct: 39
Distinct (%): < 0.1%
Missing: 25
Missing (%): < 0.1%
Memory size: 114.9 MiB

Value | Count
d | 1450420
g | 304300
l | 114458
m | 101258
h | 80032
Other values (34) | 27471

Length

Max length: 17
Median length: 1
Mean length: 1.0000823
Min length: 1

Characters and Unicode

Total characters: 2078110
Distinct characters: 35
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 12
Unique (%): < 0.1%

Sample

1st row: d
2nd row: d
3rd row: d
4th row: d
5th row: d

Common Values

Value | Count | Frequency (%)
d | 1450420 | 69.8%
g | 304300 | 14.6%
l | 114458 | 5.5%
m | 101258 | 4.9%
h | 80032 | 3.9%
w | 12324 | 0.6%
p | 11429 | 0.6%
u | 3434 | 0.2%
s | 39 | < 0.1%
t | 35 | < 0.1%
Other values (29) | 210 | < 0.1%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
d | 1450420 | 69.8%
g | 304300 | 14.6%
l | 114458 | 5.5%
m | 101258 | 4.9%
h | 80032 | 3.9%
w | 12324 | 0.6%
p | 11429 | 0.6%
u | 3434 | 0.2%
s | 39 | < 0.1%
t | 35 | < 0.1%
Other values (29) | 210 | < 0.1%

Most occurring characters

Value | Count | Frequency (%)
d | 1450420 | 69.8%
g | 304303 | 14.6%
l | 114466 | 5.5%
m | 101258 | 4.9%
h | 80042 | 3.9%
w | 12324 | 0.6%
p | 11435 | 0.6%
u | 3434 | 0.2%
s | 55 | < 0.1%
t | 52 | < 0.1%
Other values (25) | 321 | < 0.1%

Most occurring categories, scripts and blocks

Value | Count | Frequency (%)
(unknown) | 2078110 | 100.0%

Every character falls into a single category, script and block, each reported as (unknown), so the per-category, per-script and per-block character tables repeat the "Most occurring characters" frequencies above.

season
Categorical

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 114.9 MiB

Value | Count
a | 1029085
u | 768267
w | 185975
s | 94637

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 2077964
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: a
2nd row: a
3rd row: s
4th row: u
5th row: u

Common Values

Value | Count | Frequency (%)
a | 1029085 | 49.5%
u | 768267 | 37.0%
w | 185975 | 8.9%
s | 94637 | 4.6%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
a | 1029085 | 49.5%
u | 768267 | 37.0%
w | 185975 | 8.9%
s | 94637 | 4.6%

Most occurring characters

Value | Count | Frequency (%)
a | 1029085 | 49.5%
u | 768267 | 37.0%
w | 185975 | 8.9%
s | 94637 | 4.6%

Most occurring categories, scripts and blocks

Value | Count | Frequency (%)
(unknown) | 2077964 | 100.0%

Every character falls into a single category, script and block, each reported as (unknown), so the per-category, per-script and per-block character tables repeat the "Most occurring characters" frequencies above.

Interactions

[16 interaction plots; images not recoverable from this extract]

Correlations

variable | cap-diameter | does-bruise-or-bleed | gill-spacing | habitat | has-ring | id | ring-type | season | spore-print-color | stem-height | stem-root | stem-width | veil-color | veil-type
cap-diameter | 1.000 | 0.000 | 1.000 | 0.000 | 0.000 | -0.000 | 0.000 | 0.002 | 1.000 | 0.512 | 1.000 | 0.883 | 1.000 | 1.000
does-bruise-or-bleed | 0.000 | 1.000 | 0.040 | 0.033 | 0.009 | 0.001 | 0.047 | 0.092 | 0.187 | 0.057 | 0.110 | 0.107 | 0.252 | 0.000
gill-spacing | 1.000 | 0.040 | 1.000 | 0.041 | 0.047 | 0.001 | 0.047 | 0.155 | 0.573 | 0.052 | 0.242 | 0.093 | 0.141 | 0.093
habitat | 0.000 | 0.033 | 0.041 | 1.000 | 0.049 | 0.000 | 0.090 | 0.080 | 0.202 | 0.113 | 0.134 | 0.104 | 0.178 | 0.040
has-ring | 0.000 | 0.009 | 0.047 | 0.049 | 1.000 | 0.000 | 0.190 | 0.023 | 0.200 | 0.073 | 0.092 | 0.079 | 0.123 | 0.000
id | -0.000 | 0.001 | 0.001 | 0.000 | 0.000 | 1.000 | 0.001 | 0.000 | 0.001 | 0.000 | 0.000 | -0.000 | 0.003 | 0.000
ring-type | 0.000 | 0.047 | 0.047 | 0.090 | 0.190 | 0.001 | 1.000 | 0.070 | 0.235 | 0.210 | 0.148 | 0.123 | 0.188 | 0.160
season | 0.002 | 0.092 | 0.155 | 0.080 | 0.023 | 0.000 | 0.070 | 1.000 | 0.209 | 0.039 | 0.146 | 0.073 | 0.147 | 0.008
spore-print-color | 1.000 | 0.187 | 0.573 | 0.202 | 0.200 | 0.001 | 0.235 | 0.209 | 1.000 | 0.162 | 0.474 | 0.338 | 0.341 | 0.174
stem-height | 0.512 | 0.057 | 0.052 | 0.113 | 0.073 | 0.000 | 0.210 | 0.039 | 0.162 | 1.000 | 0.197 | 0.449 | 0.184 | 0.250
stem-root | 1.000 | 0.110 | 0.242 | 0.134 | 0.092 | 0.000 | 0.148 | 0.146 | 0.474 | 0.197 | 1.000 | 0.253 | 0.327 | 0.000
stem-width | 0.883 | 0.107 | 0.093 | 0.104 | 0.079 | -0.000 | 0.123 | 0.073 | 0.338 | 0.449 | 0.253 | 1.000 | 0.196 | 0.005
veil-color | 1.000 | 0.252 | 0.141 | 0.178 | 0.123 | 0.003 | 0.188 | 0.147 | 0.341 | 0.184 | 0.327 | 0.196 | 1.000 | 0.378
veil-type | 1.000 | 0.000 | 0.093 | 0.040 | 0.000 | 0.000 | 0.160 | 0.008 | 0.174 | 0.250 | 0.000 | 0.005 | 0.378 | 1.000
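
The High correlation flags in this report can be read off the matrix above. The sketch below lists the variable pairs whose coefficient is at least 0.5 in absolute value; that cutoff is an assumption chosen for illustration, and `high_pairs` is a hypothetical helper. The values are copied from the matrix.

```python
names = [
    "cap-diameter", "does-bruise-or-bleed", "gill-spacing", "habitat",
    "has-ring", "id", "ring-type", "season", "spore-print-color",
    "stem-height", "stem-root", "stem-width", "veil-color", "veil-type",
]

# Rows follow the order of `names`, copied from the correlation matrix.
matrix = [
    [1.000, 0.000, 1.000, 0.000, 0.000, -0.000, 0.000, 0.002, 1.000, 0.512, 1.000, 0.883, 1.000, 1.000],
    [0.000, 1.000, 0.040, 0.033, 0.009, 0.001, 0.047, 0.092, 0.187, 0.057, 0.110, 0.107, 0.252, 0.000],
    [1.000, 0.040, 1.000, 0.041, 0.047, 0.001, 0.047, 0.155, 0.573, 0.052, 0.242, 0.093, 0.141, 0.093],
    [0.000, 0.033, 0.041, 1.000, 0.049, 0.000, 0.090, 0.080, 0.202, 0.113, 0.134, 0.104, 0.178, 0.040],
    [0.000, 0.009, 0.047, 0.049, 1.000, 0.000, 0.190, 0.023, 0.200, 0.073, 0.092, 0.079, 0.123, 0.000],
    [-0.000, 0.001, 0.001, 0.000, 0.000, 1.000, 0.001, 0.000, 0.001, 0.000, 0.000, -0.000, 0.003, 0.000],
    [0.000, 0.047, 0.047, 0.090, 0.190, 0.001, 1.000, 0.070, 0.235, 0.210, 0.148, 0.123, 0.188, 0.160],
    [0.002, 0.092, 0.155, 0.080, 0.023, 0.000, 0.070, 1.000, 0.209, 0.039, 0.146, 0.073, 0.147, 0.008],
    [1.000, 0.187, 0.573, 0.202, 0.200, 0.001, 0.235, 0.209, 1.000, 0.162, 0.474, 0.338, 0.341, 0.174],
    [0.512, 0.057, 0.052, 0.113, 0.073, 0.000, 0.210, 0.039, 0.162, 1.000, 0.197, 0.449, 0.184, 0.250],
    [1.000, 0.110, 0.242, 0.134, 0.092, 0.000, 0.148, 0.146, 0.474, 0.197, 1.000, 0.253, 0.327, 0.000],
    [0.883, 0.107, 0.093, 0.104, 0.079, -0.000, 0.123, 0.073, 0.338, 0.449, 0.253, 1.000, 0.196, 0.005],
    [1.000, 0.252, 0.141, 0.178, 0.123, 0.003, 0.188, 0.147, 0.341, 0.184, 0.327, 0.196, 1.000, 0.378],
    [1.000, 0.000, 0.093, 0.040, 0.000, 0.000, 0.160, 0.008, 0.174, 0.250, 0.000, 0.005, 0.378, 1.000],
]


def high_pairs(names, matrix, threshold=0.5):
    """Return (name_a, name_b, value) for pairs above the diagonal with |value| >= threshold."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(matrix[i][j]) >= threshold:
                pairs.append((names[i], names[j], matrix[i][j]))
    return pairs


for a, b, value in high_pairs(names, matrix):
    print(f"{a} ~ {b}: {value:.3f}")
```

Only the upper triangle is scanned because the matrix is symmetric, so each pair is reported once.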

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
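
To make the idea of nullity correlation concrete, the sketch below computes the Pearson correlation between "value is present" indicators for three columns, using the first ten rows of the Sample section of this report. It is a plain-Python illustration under that reading of the caption, not the code used to draw the heatmap; `nullity_corr` is a hypothetical helper.

```python
import math


def nullity_corr(col_a, col_b):
    """Pearson correlation between the presence indicators of two columns."""
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")  # undefined when a column is fully present or fully missing
    return cov / math.sqrt(var_a * var_b)


# First ten sample rows; None stands for a missing value (NaN in the report).
stem_root = ["b"] + [None] * 9
veil_type = ["u"] + [None] * 9
veil_color = ["w", None, None, "n", "y", None, None, None, None, None]

print(nullity_corr(stem_root, veil_type))   # both columns are missing in the same rows
print(nullity_corr(stem_root, veil_color))  # presence only partly aligned
```

Ten rows are far too few to say anything about the full dataset; the point is only that columns missing in the same rows score near 1, while unrelated missingness scores near 0.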

Sample

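index | id | cap-diameter | cap-shape | cap-surface | cap-color | does-bruise-or-bleed | gill-attachment | gill-spacing | gill-color | stem-height | stem-width | stem-root | stem-surface | stem-color | veil-type | veil-color | has-ring | ring-type | spore-print-color | habitat | season
0 | 3116945 | 8.64 | x | NaN | n | t | NaN | NaN | w | 11.13 | 17.12 | b | NaN | w | u | w | t | g | NaN | d | a
1 | 3116946 | 6.90 | o | t | o | f | NaN | c | y | 1.27 | 10.75 | NaN | NaN | n | NaN | NaN | f | f | NaN | d | a
2 | 3116947 | 2.00 | b | g | n | f | NaN | c | n | 6.18 | 3.14 | NaN | NaN | n | NaN | NaN | f | f | NaN | d | s
3 | 3116948 | 3.47 | x | t | n | f | s | c | n | 4.98 | 8.51 | NaN | NaN | w | NaN | n | t | z | NaN | d | u
4 | 3116949 | 6.17 | x | h | y | f | p | NaN | y | 6.73 | 13.70 | NaN | NaN | y | NaN | y | t | NaN | NaN | d | u
5 | 3116950 | 4.43 | x | h | n | f | x | c | n | 5.36 | 5.50 | NaN | s | n | NaN | NaN | t | r | NaN | d | a
6 | 3116951 | 2.92 | x | d | n | f | p | NaN | e | 4.83 | 10.27 | NaN | NaN | y | NaN | NaN | f | f | NaN | d | a
7 | 3116952 | 2.59 | o | NaN | k | f | f | f | f | 2.73 | 12.71 | NaN | g | g | NaN | NaN | f | f | NaN | d | a
8 | 3116953 | 4.13 | x | t | o | f | a | c | n | 5.36 | 6.59 | NaN | y | o | NaN | NaN | t | z | NaN | d | w
9 | 3116954 | 11.91 | f | e | b | f | NaN | c | b | 5.32 | 20.20 | NaN | NaN | w | NaN | NaN | t | f | NaN | d | a

index | id | cap-diameter | cap-shape | cap-surface | cap-color | does-bruise-or-bleed | gill-attachment | gill-spacing | gill-color | stem-height | stem-width | stem-root | stem-surface | stem-color | veil-type | veil-color | has-ring | ring-type | spore-print-color | habitat | season
2077954 | 5194899 | 8.25 | x | s | y | f | a | d | w | 8.02 | 14.80 | NaN | i | n | NaN | NaN | t | NaN | w | d | a
2077955 | 5194900 | 4.57 | f | t | n | f | a | d | w | 6.49 | 4.48 | NaN | k | n | NaN | NaN | f | f | NaN | d | u
2077956 | 5194901 | 8.65 | f | NaN | n | f | x | c | w | 6.54 | 16.61 | NaN | s | w | NaN | NaN | f | f | NaN | d | u
2077957 | 5194902 | 5.59 | f | k | n | f | p | NaN | n | 3.76 | 5.34 | NaN | k | n | NaN | NaN | t | f | NaN | d | u
2077958 | 5194903 | 6.19 | f | s | b | f | x | c | w | 6.32 | 9.75 | NaN | s | n | NaN | NaN | f | f | NaN | l | w
2077959 | 5194904 | 0.88 | x | g | w | f | a | d | w | 2.67 | 1.35 | NaN | NaN | e | NaN | NaN | f | f | NaN | d | u
2077960 | 5194905 | 3.12 | x | s | w | f | d | c | w | 2.69 | 7.38 | NaN | NaN | w | NaN | NaN | f | f | NaN | g | a
2077961 | 5194906 | 5.73 | x | e | e | f | a | NaN | w | 6.16 | 9.74 | NaN | NaN | y | NaN | w | t | z | NaN | d | a
2077962 | 5194907 | 5.03 | b | g | n | f | a | d | g | 6.00 | 3.46 | NaN | s | g | NaN | NaN | f | f | NaN | d | a
2077963 | 5194908 | 15.51 | f | NaN | w | f | d | c | y | 2.69 | 17.71 | NaN | NaN | w | NaN | NaN | f | f | NaN | d | w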